Project Description¶

Welcome to New York City, one of the most visited cities in the world. Because many travelers come here, there are many Airbnb listings available for short or long stays. In this project, we will explore the New York Airbnb market by working with data stored in different file formats such as .csv, .tsv, and .xlsx.

CSV, TSV, and Excel files are common ways to store data. We have three files containing information about Airbnb listings from 2019:

  • airbnb_price.csv
    This CSV file contains details about listing prices and locations.

    • listing_id: a unique identifier for each listing
    • price: the nightly price of the listing in US dollars
    • nbhood_full: the name of the borough and neighborhood where the listing is located
  • airbnb_room_type.xlsx
    This Excel file includes information about the listing descriptions and room types.

    • listing_id: a unique identifier for each listing
    • description: a text description of the listing
    • room_type: Airbnb offers three room types — shared rooms, private rooms, and entire homes or apartments
  • airbnb_last_review.tsv
    This TSV file holds data about hosts and the dates of their last reviews.

    • listing_id: a unique identifier for each listing
    • host_name: the name of the host for the listing
    • last_review: the date when the listing was last reviewed

Our goal is to practice importing, cleaning, and manipulating data from these different formats. We will then analyze and report insights that could help a real estate start-up better understand the Airbnb arket in New York City.

In [1]:
# Import necessary packages
import pandas as pd
import numpy as np

# Import CSV for prices
airbnb_price = pd.read_csv('airbnb_price.csv')

# Import Excel file for room types
airbnb_room_type = pd.read_excel('airbnb_room_type.xlsx')

# Import TSV for review dates
airbnb_last_review = pd.read_csv('airbnb_last_review.tsv', sep='\t')

# Join the three data frames together into one
listings = pd.merge(airbnb_price, airbnb_room_type, on='listing_id')
listings = pd.merge(listings, airbnb_last_review, on='listing_id')
listings.head()
Out[1]:
listing_id price nbhood_full description room_type host_name last_review
0 2595 225 dollars Manhattan, Midtown Skylit Midtown Castle Entire home/apt Jennifer May 21 2019
1 3831 89 dollars Brooklyn, Clinton Hill Cozy Entire Floor of Brownstone Entire home/apt LisaRoxanne July 05 2019
2 5099 200 dollars Manhattan, Murray Hill Large Cozy 1 BR Apartment In Midtown East Entire home/apt Chris June 22 2019
3 5178 79 dollars Manhattan, Hell's Kitchen Large Furnished Room Near B'way private room Shunichi June 24 2019
4 5238 150 dollars Manhattan, Chinatown Cute & Cozy Lower East Side 1 bdrm Entire home/apt Ben June 09 2019

What are the dates of the earliest and most recent reviews?¶

In [2]:
# convert "last_review" to datetime type
listings['last_review_date'] = pd.to_datetime(listings['last_review'], format='%B %d %Y')
first_reviewed = listings['last_review_date'].min()
last_reviewed = listings['last_review_date'].max()

print(first_reviewed)
print(last_reviewed)
2019-01-01 00:00:00
2019-07-09 00:00:00

How many of the listings are private rooms?¶

In [3]:
# Since there are differences in capitalization, making capitalization consistent
listings['room_type'] = listings['room_type'].str.lower()
private_room_count = listings[listings['room_type'] == 'private room'].shape[0]

private_room_count
Out[3]:
11356

What is the average listing price?¶

In [4]:
# To convert price to numeric, removing " dollars" from each value
listings['price_clean'] = listings['price'].str.replace(' dollars', '').astype(float)
avg_price = listings['price_clean'].mean()

avg_price
Out[4]:
141.7779364512674

Combine the new variables into one DataFrame called review_dates with four columns in the following order: first_reviewed, last_reviewed, nb_private_rooms, and avg_price. The DataFrame should only contain one row of values.¶

In [6]:
review_dates = pd.DataFrame({
    'first_reviewed': [first_reviewed],
    'last_reviewed': [last_reviewed],
    'nb_private_rooms': [private_room_count],
    'avg_price': [round(avg_price, 2)]
})

review_dates
Out[6]:
first_reviewed last_reviewed nb_private_rooms avg_price
0 2019-01-01 2019-07-09 11356 141.78